ABSTRACT

The human visual system has the ability to utilize motion information to infer the shapes
of surfaces. More specifically, we are able to derive descriptions of rigidly rotating smooth
surfaces entirely from the orthographic projection of the motions of their surface markings.

A computational analysis of this ability is proposed based on a “shape from motion” proposition. 
This proposition states that given the first spatial derivatives of the orthographically projected 
velocity and acceleration fields of a rigidly rotating regular surface, then the angular velocity 
and the surface normal at each visible point on that surface are uniquely determined up to a 
reflection. 

The computational analysis proceeds in three main steps. First it is shown that surface tilt and 
one component of the angular velocity may be obtained entirely from the first spatial derivatives of 
the velocity field. Second it is shown that surface slant and the remaining two components of the 
angular velocity are computable if the first spatial derivatives of the acceleration field are also given.
Finally the problem of constructing a velocity field from the temporally changing optic array is briefly 
discussed. 
1. Introduction

Visual motion provides a powerful base for inferences about the layout of the immediate environment
and the motions of the various constituents of that environment. The focus of this paper is
one inference that the human visual system appears to perform routinely based on visual motion
alone. In particular, the human visual system has a remarkable ability to utilize motion information to
infer the three dimensional shapes of surfaces. More specifically, we are able to derive correct descriptions
of rigidly rotating smooth surfaces entirely from the orthographic projection of the motions of
their surface markings.

A demonstration of this ability, similar to that of Ullman (1979), is illustrated in figure 1. Dots
are randomly placed on a sphere in the memory of a computer. Successive snapshots of this random
dot sphere are generated at five degree intervals and orthographically projected in quick temporal
succession (using an ISI of 20 msec and a presentation time per frame of 20 msec) on a computer
driven CRT. Figure 1a shows three successive frames as they would appear statically on the CRT. As
is obvious from the figure, each individual frame gives no impression of being a sphere. 1 Rather it
just looks like a somewhat circular array of random dots. However, when the frames are presented
in quick temporal succession one obtains a compelling perception of a smooth sphere in rotation (see
figure 1b).
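The stimulus construction described above can be sketched as follows. This is a minimal illustration, not the original display program: the function names, the dot count, the choice of y as the vertical axis, and the decision to project every dot (treating the sphere as transparent) are all assumptions. Display timing (the 20 msec frame and ISI values) is not modeled.

```python
import math
import random

def random_sphere_dots(n, seed=0):
    """Place n dots uniformly at random on the unit sphere (y is the vertical axis)."""
    rng = random.Random(seed)
    dots = []
    for _ in range(n):
        y = rng.uniform(-1.0, 1.0)             # uniform height gives uniform area density
        phi = rng.uniform(0.0, 2.0 * math.pi)  # azimuth about the vertical axis
        r = math.sqrt(1.0 - y * y)
        dots.append((r * math.cos(phi), y, r * math.sin(phi)))
    return dots

def rotate_about_vertical(dots, degrees):
    """Rigidly rotate every dot about the vertical (y) axis."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in dots]

def orthographic(dots):
    """Orthographic projection along the line of sight (z): keep x and y only."""
    return [(x, y) for x, y, _ in dots]

def make_frames(n_dots=200, n_frames=3, step_deg=5.0, seed=0):
    """Successive snapshots of the same dots, each rotated step_deg further."""
    dots = random_sphere_dots(n_dots, seed)
    return [orthographic(rotate_about_vertical(dots, k * step_deg))
            for k in range(n_frames)]
```

Each frame taken alone is just a set of points inside a disc; only the frame-to-frame displacements carry the depth information.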

It is important to note that the perception is of a smooth spherical surface, not, for example, of
invisible wires connecting the individual dots as in Johansson's biological motion (Johansson, 1973).
One has the feeling that there is an almost tangible smooth black pearl with little lights attached to
its surface. The importance of noting this is that it indicates the type of description that appears to
be built by the visual system. It is a description whose primitives relate to surfaces rather than to
positions of isolated points. 2

That this visual ability is a nontrivial feat becomes apparent when it is realized that the mapping
from the environment onto the retina is many to one. The information available to the visual system
underdetermines the surface which is the source of the motion observed, so that any conclusions
drawn about that surface are in principle nondemonstrative. Yet, surprisingly, our perception is, in
general, of a unique surface in rotation. More surprisingly, it is more often than not correct. Clearly
the visual system must be utilizing generally valid constraints about the nature of surfaces and objects
in our world in order to obtain this unique solution. One constraint of central importance in obtaining
a unique surface is the rigidity constraint; the environment is usually, though not always, composed of
rigid objects (Ullman, 1979; Johansson, 1964 & 1975; Hay, 1966; Green, 1961; Wallach & O'Connell,
1953; Gibson & Gibson, 1957). Later this constraint will be given a precise mathematical formulation
and its utility in arriving at a unique interpretation clearly illustrated.

The goal of this paper is to provide a description of this perceptual ability at a level which Marr 
and Poggio have called a computational theory (Marr & Poggio, 1977). The computational analysis 
proposed is based on a “shape from motion” proposition 3 which states that given the first spatial 

1 This eliminates single frame information such as texture gradients from being a plausible explanation of this ability.

2 This does not discount the possibility, of course, that positions of points might be computed first and smooth surfaces
fitted through them afterwards. In fact, just such a scheme appears to be utilized in stereo vision (Grimson, 1980).

3 The term "proposition" is not intended to imply any hubristic claims regarding the complexity of this result or its
derivation. Rather it is intended to emphasize that the present inquiry is a computational analysis.




Figure 1. (a) Three successive frames of a rotating random dot sphere. Each frame is rotated five
degrees to the right about the vertical axis with respect to the previous frame. (b) The resulting
spherical percept when the frames are presented in quick succession.
Figure 2. Surface representations using slant, $\sigma$, and tilt, $\tau$. Rather than representing the surface normal at a
point in terms of surface gradients ($z_x$ and $z_y$) it is convenient to adopt the slant and tilt convention proposed
by both Stevens (1980) and Attneave (1972). Briefly, tilt indicates in which direction a surface is rotated from
the observer's frontal plane and slant indicates how much it is rotated away from the frontal plane in that
direction. Whereas surface gradients tend to infinity at occluding contours, tilt ranges only between 0 and 180
degrees and slant ranges from 0 to 90 degrees. The equations of transformation are
$\sigma = \tan^{-1}\sqrt{z_x^2 + z_y^2}$ and $\tau = \tan^{-1}(z_y / z_x)$.


derivatives of the orthographically projected velocity and acceleration fields of a rigidly rotating
regular surface, its angular velocity and the surface normal at each visible point on the surface are
uniquely determined up to a reflection about the image plane.

For clarity the computational analysis is presented in three main steps. First it is shown that surface
tilt (see figure 2) and one component of the angular velocity may be obtained from the first spatial
derivatives of the velocity field. Then it is shown that surface slant and the remaining two components
of the angular velocity are computable if the first spatial derivatives of the acceleration field are also
given. Finally, since the computational analysis assumes as one of its givens a velocity field, the
problem of constructing a velocity field from the temporally changing optic array is discussed briefly.
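The slant and tilt convention of figure 2 can be illustrated with a short sketch. The function name `slant_tilt` is hypothetical, and folding the tilt into the range 0 to 180 degrees with `atan2` followed by a modulo is one possible reading of the convention, assumed here rather than taken from the text.

```python
import math

def slant_tilt(zx, zy):
    """Convert surface gradients (z_x, z_y) to (slant, tilt) in degrees.

    slant = atan(sqrt(z_x^2 + z_y^2)), between 0 and 90 degrees.
    tilt  = atan(z_y / z_x), folded into [0, 180) degrees (assumed convention).
    Tilt is undefined for a frontal patch (z_x = z_y = 0); 0.0 is returned there.
    """
    slant = math.degrees(math.atan(math.hypot(zx, zy)))
    tilt = math.degrees(math.atan2(zy, zx)) % 180.0
    return slant, tilt
```

For example, a patch with gradients (1, 0) is slanted 45 degrees with tilt 0, while gradients (-1, -1) give the same tilt of 45 degrees as (1, 1), since tilt carries direction only up to 180 degrees.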
2. Two Previous Computational Analyses

The ability of the human visual system to infer the correct three dimensional description of an object
from its projected motion alone has been investigated computationally several times before. Two
of these previous analyses will be briefly discussed to illustrate the two basic types of computational
approaches that can be taken to this problem and the two basic types of resulting descriptions.

Ullman (1979) took what may be called the "discrete approach" to the problem. The givens
for his computational analysis are three successive snapshots of isolated points moving in a rigid
configuration. The resulting description he builds is essentially a set of triples giving the three dimensional
positions of the points in relation to each other. Fundamental to Ullman's elegant analysis is his
"structure from motion" theorem which states that the structure of four non-coplanar points in a rigid
configuration is recoverable from three orthographic projections.

An example of the "continuous approach" to the problem can be found in Longuet-Higgins and
Prazdny (1980). 4 Rather than utilizing discrete orthographic projections as input, they assume a
velocity field arising from a perspective projection. The resulting description computed involves surfaces
instead of sets of triples. In short they prove that given the perspective projection and first and
second spatial derivatives of the velocity field presented to a moving observer it is in principle possible
to compute both the observer's motion and the surface gradients at each point in the visual field.

The present analysis falls into the continuous category. Flow fields are assumed as the input and a
description of the surface of interest in terms of the surface normal (slant, tilt) at each visible point is
the desired result. Where the current analysis differs from that of Longuet-Higgins and Prazdny and
other previous work within the continuous approach is that here orthographic projection is assumed
instead of perspective projection. Consequently in this analysis it proves impossible to derive both
the observer's motion and a complete surface description merely from the velocity field and its spatial
derivatives. The relations among these various approaches are summarized in figure 3.


3. Why Use Orthographic Projection? 

Why bother performing a computational analysis of the problem assuming orthographic 
projection? After all it will be shown that less information about local surface properties can be 
computed from the velocity field in orthographic projection than in perspective. Specifically, surface 
slant computation requires the temporal derivative of the velocity field. There are several motivations. 

First, as Ullman (1979) points out, perspective effects are often rather noisy and unreliable. To
utilize them locally would require very careful measurements by the visual system.

Second, orthographic projection provides a good local approximation to the actual retinal projection.
A theorem from differential topology allows us to conclude that whatever the true retinal

4 Several other researchers have examined aspects of this problem from a continuous point of view (Koenderink & Van
Doorn, 1976; Nakayama & Loomis, 1974; Gibson, 1950).
Figure 3. A categorization of the various computational approaches to the problem of deriving shape from
motion. The categorization scheme is given by crossing projection type (orthographic or perspective) with
motion type (discrete frames versus optical flow).

projection is, it is locally equivalent to orthographic projection. 5 

A third motivation is provided by the results of some psychophysical tests done by Ullman. Using a
cylinder composed of random dots he showed that observers can recover the correct structure entirely
from the orthographic projection of the motion of the dots when the cylinder is rotated about its
axis. However, observers cannot recover the structure under perspective projection when the object
is alternately receding and approaching without any rotation. This is significant because a computational
analysis shows that if perspective effects are taken into account the structure can in principle be
recovered from receding and approaching motion alone. These results tend to support the psychological
reality of a computational theory based on a locally orthographic projection for the recovery of
shape from motion.

Alternate computational analyses provide clear candidate hypotheses that may be tested for their
psychological reality, and each lends different insights into the subject of study. For example it
will be shown later that the tilt component of the surface normal is much more easily recovered than
the slant component, both in the nature of the motion information required and the computations
involved. This is an interesting result and one that could provide a basis for psychophysical examination
of the psychological reality of this analysis.

5 The theorem is called the Local Submersion Theorem (see, for example, Guillemin & Pollack (1974)). It states, "Suppose
that $f: X \to Y$ is a submersion at $x$, and $y = f(x)$. Then there exist local coordinates around $x$ and $y$ such that for
$k \geq l$, $f(x_1, \ldots, x_k) = (x_1, \ldots, x_l)$. That is, $f$ is locally equivalent to the canonical submersion near $x$."

Finally, the equations for surface orientation and motion derived using orthographic projection
are much simpler than those derived under perspective projection. Not only are the equations
simpler, they do not require measurements of the second spatial derivatives of the velocity field as is
typical in the perspective case.


4. Geometrical Model 

The idealized geometry underlying the following computational analysis is illustrated in figure 4. A
rigid patch of surface, $S$, is considered to be an open set of points each of which has an associated
position vector $\mathbf{R}$. The position vector for a point on $S$ with respect to the $x, y, z$ coordinate system is
given by:

$$\mathbf{R} = x\mathbf{i} + y\mathbf{j} + z(x, y)\mathbf{k} \tag{1}$$

where $\mathbf{i}, \mathbf{j}, \mathbf{k}$ are unit vectors along the $x, y, z$ axes respectively. The surface, $S$, has an angular velocity
$\boldsymbol{\Omega}$ given by:

$$\boldsymbol{\Omega} = \omega_1\mathbf{i} + \omega_2\mathbf{j} + \omega_3\mathbf{k} \tag{2}$$

with respect to the $x, y, z$ coordinate system.

Note that $\boldsymbol{\Omega}$ may result either from rotary motions of the surface, or from movement of the image
plane, $I$, with respect to $S$, or both. As long as $\boldsymbol{\Omega}$ is not zero it doesn't matter whether the surface is
rotating and the observer remains stationary or whether the surface is stationary and the observer's
motion with respect to the surface includes an angular component.

Associated with $S$ is a velocity vector field, $\mathbf{V}$, which at any point $p \in S$ is given by:

$$\mathbf{V} = \boldsymbol{\Omega} \times \mathbf{R} + \mathbf{T} \tag{3}$$

where $\mathbf{T}$ is any net translation between the observer and the surface. 6

The velocity field available to the observer is an orthographic (parallel) projection of the velocity 
field, V, associated with S onto the image plane, I. 
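The velocity field of equation (3) and its orthographic projection can be sketched numerically. The helper names `cross`, `velocity` and `project` are hypothetical, and the projection is written as the removal of the component along the line of sight (the z axis), which is how the analysis later characterizes it.

```python
def cross(a, b):
    """Vector cross product a x b for 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def velocity(omega, r, t=(0.0, 0.0, 0.0)):
    """Velocity V = Omega x R + T of the surface point with position vector r."""
    wxr = cross(omega, r)
    return tuple(p + q for p, q in zip(wxr, t))

def project(v, k=(0.0, 0.0, 1.0)):
    """Orthographic projection V - (V . k) k onto the plane orthogonal to k."""
    vk = sum(p * q for p, q in zip(v, k))
    return tuple(p - vk * q for p, q in zip(v, k))
```

As a usage example, for a unit sphere rotating about the vertical axis with `omega = (0, 1, 0)`, the nearest point `(0, 0, 1)` has projected velocity `(1, 0, 0)`, whereas the limb point `(1, 0, 0)` moves purely in depth and projects to zero image velocity.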

Now this is clearly an idealization. The real observer is definitely not given a velocity field but 
must construct such a field from the temporally changing optic array. This problem will be discussed 
briefly later. For the analysis of the present problem of inferring the shape of S, the orthographically 
projected velocity field is assumed as a given. 

With this simple geometrical model as background the computational analysis of the problem of
inferring shape from orthographically projected motion is now presented as the proof of the following
proposition.

6 Actually T is any net translation between the observer and the axis of rotation of the surface. However, the translation 
term is of no consequence for the present analysis since it will drop out when the spatial derivatives of (3) are taken. 
Figure 4. Geometrical model underlying the computational analysis. $S$ is some surface with angular
velocity $\boldsymbol{\Omega}$. The resulting velocity field $\mathbf{V}$ is orthographically projected onto the image plane $I$.
5. The Shape From Motion Proposition

Given the first spatial derivatives of the orthographically projected velocity and acceleration fields of 
a rigidly rotating regular surface, the angular velocity and surface normal at each visible point on the 
surface are uniquely determined up to a reflection about the image plane. 

Proof. The proof of this proposition involves deriving equations for the two components ($\sigma$, $\tau$) of
the surface normal, $\mathbf{N}$, at each visible point and for the three components of the angular velocity ($\omega_1$,
$\omega_2$, $\omega_3$). For clarity of presentation the proof is divided into two lemmas. In the first lemma equations
for the tilt, $\tau$, and one component of the angular velocity, $\omega_3$, are derived and discussed. In the second
lemma the same is done for the slant, $\sigma$, and the remaining two components of the angular velocity,
$\omega_1$ and $\omega_2$.


5.1 Lemma 1. 

Both the tilt, $\tau$, at each visible point on $S$ and the component of angular velocity about the axis
orthogonal to the image plane, $\omega_3$, are computable given only the first spatial derivatives of the
orthographically projected velocity field.

To make the claims of this lemma clearer figure 5 illustrates the tilt fields associated with two
simple surfaces and figure 4 illustrates with which axis the angular velocity component, $\omega_3$, is
associated.

Proof of Lemma 1. Since the projection plane, $I$, is orthogonal to the unit vector, $\mathbf{k}$, the
orthographic projection of the velocity field, $\mathbf{V}^*$, is given by: 7

$$\mathbf{V}^* = \mathbf{V} - (\mathbf{V} \cdot \mathbf{k})\mathbf{k} \tag{4}$$

What this essentially means is that the components of the velocity field along the x and y axes survive
orthographic projection unaltered, whereas the component along the z axis (i.e., along the observer's
line of sight) is eliminated completely. Consequently the only spatial derivatives of the velocity field
that need be computed are along the x and y directions. Denoting spatial partial derivatives by
subscripts, the first spatial derivatives of the velocity field (equation 3) along the x and y axes are:

$$\mathbf{V}_x = \boldsymbol{\Omega}_x \times \mathbf{R} + \boldsymbol{\Omega} \times \mathbf{R}_x \tag{5}$$

$$\mathbf{V}_y = \boldsymbol{\Omega}_y \times \mathbf{R} + \boldsymbol{\Omega} \times \mathbf{R}_y \tag{6}$$


Before investigating equations (5) and (6) further it is helpful to introduce a mathematical expression
for the rigidity constraint that will allow these equations to be simplified. The motivation for the
particular mathematical expression to be used here is simple. One consequence of surface rigidity is
that the entire surface can have only one angular velocity, $\boldsymbol{\Omega}$. Regardless of which neighborhood of

7 This characterization of the orthographic projection of a vector is borrowed from Witkin (1980).
Figure 5. Tilt fields compared with fields of surface normals for two surfaces. According to lemma
1 one can obtain the correct tilt field $(1, \tau)$ from the velocity field but, unfortunately, not the field of
surface normals $(\sigma, \tau)$. A tilt field is an example of a field of directions (Do Carmo, 1976, p.178). Since
no magnitude information is known, only the direction of tilt, the tilt fields in (a) and (b) are indicated
by constant length vectors pointing in the direction of tilt. The surfaces are (a) a sphere and (b) a
cylinder.
$$\begin{aligned}
\partial u/\partial x &\approx [(u_3 - u_2) + (u_1 - u_0)]/2d \\
\partial v/\partial x &\approx [(v_3 - v_2) + (v_1 - v_0)]/2d \\
\partial u/\partial y &\approx [(u_2 - u_0) + (u_3 - u_1)]/2d \\
\partial v/\partial y &\approx [(v_2 - v_0) + (v_3 - v_1)]/2d
\end{aligned}$$

Figure 6. The orthographically projected velocity field and its first spatial derivatives. (a) illustrates
the decomposition of a velocity vector at a point in the field into its x component, $u$, and its y
component, $v$. (b) illustrates how the spatial derivatives $u_x$, $u_y$, $v_x$ and $v_y$ can be approximated at
some point, $p$, from the local velocity field.
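The approximations of figure 6(b) can be written out as a short sketch. The sample layout is an assumption inferred from the formulas, since the figure itself is not reproduced here: four velocity samples at the corners of a square of side `d`, numbered 0 (lower left), 1 (lower right), 2 (upper left) and 3 (upper right). The function name `velocity_gradient` is hypothetical.

```python
def velocity_gradient(samples, d):
    """Approximate (u_x, u_y, v_x, v_y) from four projected velocity samples.

    samples: [(u0, v0), (u1, v1), (u2, v2), (u3, v3)] taken at the corners of
             a square of side d (assumed layout: 0 lower left, 1 lower right,
             2 upper left, 3 upper right).
    Each derivative averages the two available differences along that axis.
    """
    (u0, v0), (u1, v1), (u2, v2), (u3, v3) = samples
    ux = ((u3 - u2) + (u1 - u0)) / (2.0 * d)
    vx = ((v3 - v2) + (v1 - v0)) / (2.0 * d)
    uy = ((u2 - u0) + (u3 - u1)) / (2.0 * d)
    vy = ((v2 - v0) + (v3 - v1)) / (2.0 * d)
    return ux, uy, vx, vy
```

For a velocity field that is exactly linear over the square, these averages return the true derivatives; for a curved field they are first-order estimates at the center of the square.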
5.2 Remarks on Lemma 1.

An important problem for the computational investigation of early vision is the initial carving up of
the visual array into tentative objects and a background. This is important because it is a fundamental
contention of the bottom up computational approach that there exist autonomous low level visual
processes capable of providing a useful initial segregation of the visual world independent of higher
level cognitive influences. For example, it is a primary goal of the primal sketch and 2½-D sketch,
early visual representations proposed by Marr (1976) and Marr and Nishihara (1978), to make explicit
exactly that information in a visual image which is required to build useful descriptions of the image
in terms of objects and their relations. The processes proposed both to build and to operate upon
these representations are invariably bottom up. If this endeavor fails, so too does much of the computational
approach to vision. Therefore a high priority activity of computational research in vision is
to provide convincing existence proofs (e.g., running computer programs) for this contention.

Visual motion seems a likely candidate base for tentative structuring of the visual array via
autonomous processes. This has been suggested many times before. Ullman (1979, p. 76) proposes
that a primitive motion correspondence process might be causally related to the child's acquisition
of object constancy over changing views of an object. Marr and Ullman (1979) have suggested that
retinal velocity fields may be used to segregate the visual world by exploiting the "principle of continuous
flow". This principle states that "the velocity field of motion within the image of a rigid object
varies continuously almost everywhere."

The results of lemma 1 suggest four further motion based segregation methods. The first two arise
again from the fact that a rigid body can have but one angular velocity at any instant. Since lemma 1
provides methods to compute $\omega_3$ and $\omega_1/\omega_2$ locally, it is possible to segregate the field into regions of
constant $\omega_3$ and constant $\omega_1/\omega_2$. In fact, the segregations obtained by the two methods should agree,
providing the necessary redundancy to check for gross errors.

The third method is based on noting that the discriminant of equation (16) for $\omega_3$ remains nonnegative over
regions in the image which are the projections of smooth rigid surfaces. 9 Therefore points where $\omega_3$
becomes complex indicate regions in the image where the rigidity constraint is violated (or where the
surface has a discontinuity from the current viewpoint).
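The discriminant test can be made concrete with a hedged sketch. The equations referred to as (12)-(16) do not appear in the text as given here, so the forms used below are a reconstruction: expanding (5) and (6) with a spatially constant angular velocity gives $u_x = \omega_2 z_x$, $u_y = \omega_2 z_y - \omega_3$, $v_x = \omega_3 - \omega_1 z_x$ and $v_y = -\omega_1 z_y$, and eliminating the surface gradients leaves the quadratic $\omega_3^2 + (u_y - v_x)\omega_3 + (u_x v_y - u_y v_x) = 0$. Its discriminant, $(u_y + v_x)^2 - 4 u_x v_y$, reduces to $(\omega_2 z_y + \omega_1 z_x)^2$ for a rigid surface. The function names are hypothetical.

```python
import math

def omega3_candidates(ux, uy, vx, vy):
    """Real roots of w3^2 + (uy - vx) w3 + (ux vy - uy vx) = 0 (reconstructed form).

    Returns an empty list when the discriminant is negative, which under the
    reasoning above flags a violation of rigidity at this image point.
    """
    disc = (uy + vx) ** 2 - 4.0 * ux * vy
    if disc < 0.0:
        return []
    root = math.sqrt(disc)
    return [((vx - uy) + root) / 2.0, ((vx - uy) - root) / 2.0]

def tilt_for(ux, uy, w3):
    """Tilt in degrees implied by a candidate w3: tan(tilt) = (uy + w3) / ux."""
    return math.degrees(math.atan2(uy + w3, ux)) % 180.0
```

The quadratic has two roots in general, so the sketch returns both candidates and leaves the choice between them to whatever further constraint is available.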

Finally, we can utilize constraints on tilt fields. For smooth rigid surfaces a “principle of continuous 
tilt” analogous to that proposed by Marr and Ullman for optical flow may be invoked to segregate 
the visual array. This principle states that “the tilt field within the image of a rigid object varies 
continuously almost everywhere.” 

These four segregation techniques are not isomorphic to Marr and Ullman's principle of continuous
flow. The methods suggested here segregate the image into regions which are the projections
of rigid objects. The principle of continuous flow cannot. Since it does not explicitly incorporate a
rigidity constraint, 10 the principle of continuous flow cannot be used to distinguish regions of smooth
flow in the image which arise from rigid objects from those which arise from bending or otherwise
non-rigid substances. Consequently the segregations provided by the different methods are not identical
but are useful for different purposes.

9 This is easily proved by substituting from equations (12)-(15) into the appropriate terms of the discriminant of (16).
Simplifying gives $(\omega_2 z_y + \omega_1 z_x)^2$, which is never negative. Implicit in equations (12)-(15) is the rigidity
assumption.

10 The word rigid does appear in the statement of their principle, but it is equally true that "the velocity field of motion
within the image of a bending object varies continuously almost everywhere."


5.3 Lemma 2. 

The surface slant, $\sigma$, and the remaining two components of the angular velocity, $\omega_1$ and $\omega_2$, are
computable given the spatial derivatives of the orthographically projected acceleration field in addition to
those of the velocity field.

Proof of Lemma 2. The acceleration field associated with a smooth rigid surface is found by taking
the time derivative of equation (3). Indicating temporal derivatives by primes we have:

$$\mathbf{V}' = \boldsymbol{\Omega}' \times \mathbf{R} + \boldsymbol{\Omega} \times \mathbf{R}' + \mathbf{T}' \tag{19}$$

where

$$\mathbf{R}' = \boldsymbol{\Omega} \times \mathbf{R} = \mathbf{V} \tag{20}$$

$$\boldsymbol{\Omega}' = \omega'_1\mathbf{i} + \omega'_2\mathbf{j} + \omega'_3\mathbf{k} \tag{21}$$

If we take the first spatial derivatives of (19), simplify the results using the rigidity constraint of (7), 
and expand the indicated cross products as before, we obtain the four equations: 


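$$u'_x = \omega'_2 z_x - \omega_2^2 - \omega_3^2 + \omega_1\omega_3 z_x \tag{22}$$

$$u'_y = \omega'_2 z_y - \omega'_3 + \omega_1\omega_2 + \omega_1\omega_3 z_y \tag{23}$$

$$v'_x = \omega'_3 - \omega'_1 z_x + \omega_2\omega_3 z_x + \omega_1\omega_2 \tag{24}$$

$$v'_y = -\omega'_1 z_y + \omega_2\omega_3 z_y - \omega_1^2 - \omega_3^2 \tag{25}$$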
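Equations (22)-(25) can be checked numerically against a direct evaluation of the acceleration field. The sketch below is only a consistency check under stated assumptions: the surface patch `surface` is an arbitrary hypothetical quadratic, the translation term is omitted because it is spatially constant and vanishes under differentiation, and the acceleration is evaluated as $\boldsymbol{\Omega}' \times \mathbf{R} + \boldsymbol{\Omega} \times (\boldsymbol{\Omega} \times \mathbf{R})$. All function names are illustrative.

```python
def cross(a, b):
    """Vector cross product a x b for 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def surface(x, y):
    """Hypothetical smooth surface patch z(x, y)."""
    return 0.5 * x + 0.3 * y + 0.2 * x * x - 0.1 * x * y

def surface_gradient(x, y):
    """Analytic gradient (z_x, z_y) of the patch above."""
    return 0.5 + 0.4 * x - 0.1 * y, 0.3 - 0.1 * x

def projected_acceleration(x, y, w, dw):
    """x and y components of Omega' x R + Omega x (Omega x R) on the surface."""
    r = (x, y, surface(x, y))
    a = [p + q for p, q in zip(cross(dw, r), cross(w, cross(w, r)))]
    return a[0], a[1]

def numeric_derivatives(x, y, w, dw, h=1e-3):
    """Central-difference estimates of (u'_x, u'_y, v'_x, v'_y)."""
    xp = projected_acceleration(x + h, y, w, dw)
    xm = projected_acceleration(x - h, y, w, dw)
    yp = projected_acceleration(x, y + h, w, dw)
    ym = projected_acceleration(x, y - h, w, dw)
    return ((xp[0] - xm[0]) / (2 * h), (yp[0] - ym[0]) / (2 * h),
            (xp[1] - xm[1]) / (2 * h), (yp[1] - ym[1]) / (2 * h))

def closed_form(zx, zy, w, dw):
    """Right-hand sides of equations (22)-(25)."""
    w1, w2, w3 = w
    d1, d2, d3 = dw
    return (d2 * zx - w2 ** 2 - w3 ** 2 + w1 * w3 * zx,
            d2 * zy - d3 + w1 * w2 + w1 * w3 * zy,
            d3 - d1 * zx + w2 * w3 * zx + w1 * w2,
            -d1 * zy + w2 * w3 * zy - w1 ** 2 - w3 ** 2)
```

Agreement between `numeric_derivatives` and `closed_form` at an arbitrary point is evidence that the four equations are mutually consistent with (19)-(21); it says nothing about measurement noise in a real velocity field.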


Equations (12)-(15) and (22)-(25) relate eight quantities measurable in principle from the image,
($u_x$, $u_y$, $v_x$, $v_y$, $u'_x$, $u'_y$, $v'_x$, $v'_y$), to the eight unknowns of interest: the local surface normal, ($\sigma$, $\tau$),
the three components of the angular velocity, ($\omega_1$, $\omega_2$, $\omega_3$), and the three components of the angular
acceleration, ($\omega'_1$, $\omega'_2$, $\omega'_3$). The simple fact that we have eight equations in eight unknowns does not
necessarily imply that this system has but a finite number of solutions. To ascertain if there are a finite
number of solutions we apply the inverse function theorem. 11 This theorem allows us to conclude

11 For an informal discussion of the utility of the inverse function theorem, Bezout's theorem, and Sard's theorem for
problems involving systems of nonlinear equations see Richards, Rubin, and Hoffman (1981).

that wherever the Jacobian of these equations is nonsingular the mapping defined by the equations is
locally one to one and onto (i.e., a local diffeomorphism). Consequently any roots at points where the
Jacobian is nonsingular are isolated and not part of a continuum of solutions. The determinant of the
Jacobian of (12)-(15) and (22)-(25) is:


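$$\begin{vmatrix}
\omega_2 & 0 & 0 & z_x & 0 & 0 & 0 & 0 \\
-\omega_1 & 0 & -z_x & 0 & 1 & 0 & 0 & 0 \\
0 & \omega_2 & 0 & z_y & -1 & 0 & 0 & 0 \\
0 & -\omega_1 & -z_y & 0 & 0 & 0 & 0 & 0 \\
\omega_1\omega_3 + \omega'_2 & 0 & \omega_3 z_x & -2\omega_2 & \omega_1 z_x - 2\omega_3 & 0 & z_x & 0 \\
0 & \omega_1\omega_3 + \omega'_2 & \omega_3 z_y + \omega_2 & \omega_1 & \omega_1 z_y & 0 & z_y & -1 \\
\omega_2\omega_3 - \omega'_1 & 0 & \omega_2 & \omega_3 z_x + \omega_1 & \omega_2 z_x & -z_x & 0 & 1 \\
0 & \omega_2\omega_3 - \omega'_1 & -2\omega_1 & \omega_3 z_y & \omega_2 z_y - 2\omega_3 & -z_y & 0 & 0
\end{vmatrix}$$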

This Jacobian has rank eight. Consequently the system of equations has but a finite set of solutions
in general. 12 By Bezout's theorem 11 we know that the sum of the multiplicities of the solutions does
not exceed the product of the degrees of the equations.
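The rank claim can be probed numerically with a small sketch. The matrix below assumes the row order $u_x, v_x, u_y, v_y, u'_x, u'_y, v'_x, v'_y$ and the column order $(z_x, z_y, \omega_1, \omega_2, \omega_3, \omega'_1, \omega'_2, \omega'_3)$; that ordering is inferred from the entries of the determinant rather than stated in the text. The elimination routine is a generic textbook method, and the function names are hypothetical.

```python
def jacobian(zx, zy, w, dw):
    """8 x 8 Jacobian of the velocity and acceleration derivative equations."""
    w1, w2, w3 = w
    d1, d2, d3 = dw
    a = w1 * w3 + d2
    b = w2 * w3 - d1
    return [
        [w2, 0, 0, zx, 0, 0, 0, 0],
        [-w1, 0, -zx, 0, 1, 0, 0, 0],
        [0, w2, 0, zy, -1, 0, 0, 0],
        [0, -w1, -zy, 0, 0, 0, 0, 0],
        [a, 0, w3 * zx, -2 * w2, w1 * zx - 2 * w3, 0, zx, 0],
        [0, a, w3 * zy + w2, w1, w1 * zy, 0, zy, -1],
        [b, 0, w2, w3 * zx + w1, w2 * zx, -zx, 0, 1],
        [0, b, -2 * w1, w3 * zy, w2 * zy - 2 * w3, -zy, 0, 0],
    ]

def determinant(matrix, eps=1e-12):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < eps:
            return 0.0                      # singular to working precision
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                      # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return det
```

One degenerate case is easy to see from the matrix itself: for pure rotation about the line of sight with no angular acceleration in the image plane ($\omega_1 = \omega_2 = \omega'_1 = \omega'_2 = 0$) the first column vanishes, so the determinant is zero and the surface gradient is not locally constrained.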

We have shown that there are but a finite number of solutions given the spatial derivatives of the
velocity and acceleration fields at one point. In fact (12)-(15) and (22)-(25) can be solved uniquely (up
to a reflection) for $\sigma$, $\omega_1$, and $\omega_2$ in terms of $\omega_3$:



(26) 


(27) 


(28) 


where 

a = (u/ 3 + u' y )« — b/ 3 ) — « -f u$(w£ + v' y ) 


(29) 


/? = (w 3 — V X )\J\ + u' x ) + u x { v' x + u' y )(u>3 — V x ) + u 2 x (ui l + v' y ) (30) 

7 = («3 + v 'y)^ + u v? - v y( u 'y + 0(^3 + Uy) + vj(w| -f <) 


(31)


This concludes the proof of lemma 2 and of the shape from motion proposition.

12 Degenerate conditions can be found by determining when this determinant is zero.
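The recovery step of lemma 2 can also be sketched numerically. The code below does not use the closed forms (26)-(31); it is an independent elimination, offered as an assumption-laden illustration. With $a = u_x$, $b = u_y + \omega_3$ and $c = \omega_3 - v_x$ (equal to $\omega_2 z_x$, $\omega_2 z_y$ and $\omega_1 z_x$ under the first-derivative relations), eliminating the angular accelerations from (22)-(25) gives $\omega_2^2 = -n/(b + c)^2$, where $n = b^2(u'_x + \omega_3^2) - ab(u'_y + v'_x) + a^2(v'_y + \omega_3^2)$. The function names are hypothetical, and $\omega_3$ is taken as already known from lemma 1.

```python
import math

def forward(zx, zy, w, dw):
    """Image measurements predicted by the first-derivative relations and (22)-(25)."""
    w1, w2, w3 = w
    d1, d2, d3 = dw
    vel = (w2 * zx, w2 * zy - w3, w3 - w1 * zx, -w1 * zy)
    acc = (d2 * zx - w2 ** 2 - w3 ** 2 + w1 * w3 * zx,
           d2 * zy - d3 + w1 * w2 + w1 * w3 * zy,
           d3 - d1 * zx + w2 * w3 * zx + w1 * w2,
           -d1 * zy + w2 * w3 * zy - w1 ** 2 - w3 ** 2)
    return vel, acc

def recover(vel, acc, w3):
    """Recover (slant_deg, tilt_deg, w1, w2) given w3, or None if degenerate.

    The positive root for w2 is chosen; the reflected solution negates
    z_x, z_y, w1 and w2 and leaves slant and tilt unchanged.
    """
    ux, uy, vx, vy = vel
    dux, duy, dvx, dvy = acc
    a, b, c = ux, uy + w3, w3 - vx
    n = b * b * (dux + w3 * w3) - a * b * (duy + dvx) + a * a * (dvy + w3 * w3)
    if a == 0.0 or b + c == 0.0 or n >= 0.0:
        return None
    w2 = math.sqrt(-n) / abs(b + c)
    zx, zy = a / w2, b / w2
    w1 = c / zx
    slant = math.degrees(math.atan(math.hypot(zx, zy)))
    tilt = math.degrees(math.atan2(zy, zx)) % 180.0
    return slant, tilt, w1, w2
```

The sketch fails (returns `None`) when $u_x = 0$ or when $\omega_1 z_x + \omega_2 z_y = 0$, which are conditions of this particular elimination and should not be read as the complete list of degenerate cases.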
5.4 Remarks on the Shape From Motion Proposition

It has been shown that the angular velocity and the surface normal at each visible point of a rigidly
rotating regular surface are uniquely determined up to a reflection if one is given the orthographic
projection and first spatial derivatives of the associated velocity and acceleration fields. This proposition
and its proof are proposed as the basis for a computational theory of the human visual ability to
perceive the shape of a smooth moving surface from its motion alone.

Some disclaimers are in order. First, only arguments for the sufficiency of this approach, not its 
necessity, have been suggested. Alternative computational theories are available, some of which were 
discussed earlier. It is a matter for empirical investigation to determine which, if any, of the current 
theories is to some extent psychologically real. 

Two pieces of psychophysical evidence may be adduced to suggest the greater psychological reality 
of the present approach over previous ones. First are Ullman’s (1979) experiments, mentioned before, 
which indicate that only orthographic, not perspective, information seems to be utilized by the visual 
system in recovering surface shapes. The second is that the resulting perceptual effect (illustrated 
in figure 1) is of a smooth surface as opposed to isolated points connected by invisible wires. This 
suggests greater psychological reality for an approach which builds a description whose primitives 
relate to surfaces. 

A second disclaimer must be mentioned. The visual system may utilize additional generally valid
constraints for the interpretation of surface shapes from motion. For example, shortcuts in computing
the slant, $\sigma$, might be based on noting that $\sigma$ must be 90 degrees at external occluding contours
and must vary smoothly between them. Another potentially powerful constraint is that the tilt field
must be locally orthogonal to the image of its occluding contour (for smooth surfaces). Thus further
investigation of valid means to reduce the computational complexity of this approach is warranted
before serious claims for its psychological reality can be sustained.


6. Computing Velocity Fields 

To this point the analysis has assumed the velocity field and its first spatial derivatives are given. 13 
Clearly this is not the case for a real observer. The real observer is presented with a temporally 
changing optic array. If a velocity field is required it must be constructed from the changing optic 
array. 

The problem of computing a velocity field has remained nontrivial despite much recent research.
One can show that the motion information available at any single point in a changing optic array is
insufficient to uniquely determine the velocity field at that point. Consequently much of the research
in the field of optical flow has been devoted to discovering valid means of integrating motion information
from local neighborhoods to uniquely determine the flow at each point in the neighborhood.
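The local ambiguity can be illustrated with a small sketch. It assumes the brightness constancy constraint $I_x u + I_y v + I_t = 0$ used in gradient-based optical flow work; that constraint is not stated in the present text and is introduced here only for illustration, as are the function names. A single point supplies one linear equation in the two unknowns $(u, v)$, so only the flow component along the brightness gradient is fixed.

```python
def normal_flow(ix, iy, it):
    """Flow component along the brightness gradient, or None if the gradient is zero."""
    g2 = ix * ix + iy * iy
    if g2 == 0.0:
        return None
    s = -it / g2
    return s * ix, s * iy

def consistent_flows(ix, iy, it, lambdas):
    """A family of flows that all satisfy ix*u + iy*v + it = 0 at one point.

    Each member adds a multiple of the direction perpendicular to the gradient,
    which the single-point measurement cannot detect.
    """
    un, vn = normal_flow(ix, iy, it)
    return [(un - lam * iy, vn + lam * ix) for lam in lambdas]
```

Every flow returned by `consistent_flows` fits the measurement at that point equally well, which is why information must be pooled over a neighborhood to obtain a unique velocity field.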

A detailed analysis of the problem of determining optical flow is presented in Horn and Schunck 
(1980), which also includes a representative list of references on the topic. 

13 Actually only the first spatial derivatives have been used.
7. Summary

A computational analysis of the human visual ability to infer surface shapes entirely from their motion
has been presented. The analysis proceeded in three main steps. First it was shown that surface
tilt, $\tau$, and the component of angular velocity orthogonal to the image plane, $\omega_3$, may be derived from
just the spatial derivatives of the velocity field (assuming orthographic projection). Then it was shown
that surface slant, $\sigma$, and the two components of angular velocity lying parallel to the image plane, $\omega_1$
and $\omega_2$, are computable if the first spatial derivatives of the acceleration field are also available. Finally
the problem of computing velocity fields from changing optic arrays was discussed briefly.